Spark: master local[*] is a lot slower than master local

Ask Time:2016-11-09T05:21:34         Author:Sai Wai Maung

I have an EC2 instance set up: an r3.8xlarge (32 cores, 244 GB RAM).

In my Spark application, I read two CSV files from S3 using spark-csv from Databricks; each CSV has about 5 million rows. I combine the two DataFrames with unionAll and then run dropDuplicates on the combined DataFrame.

But when I configure Spark like this,

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setMaster("local[32]")
      .setAppName("Raw Ingestion On Apache Spark")
      .set("spark.sql.shuffle.partitions", "32")

Spark is slower than it is with .setMaster("local").

Wouldn't it be faster with 32 cores?
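
For reference, the job described above could be sketched roughly as follows. This is a minimal sketch, not the actual application code: it assumes the Spark 1.x API with spark-csv on the classpath, the S3 paths and the object name `RawIngestion` are hypothetical, and the `header` option is a guess about the files. It takes the master URL as an argument so that `local` and `local[32]` can be timed against the same code.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object RawIngestion {
  // Hypothetical S3 locations; the real paths are not given in the question.
  val pathA = "s3n://example-bucket/first.csv"
  val pathB = "s3n://example-bucket/second.csv"

  // Reads one CSV file through the spark-csv data source.
  def readCsv(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true") // assumption: the files have a header row
      .load(path)

  def main(args: Array[String]): Unit = {
    // Pass "local" or "local[32]" as the first argument to compare the two settings.
    val master = if (args.nonEmpty) args(0) else "local[32]"

    val conf = new SparkConf()
      .setMaster(master)
      .setAppName("Raw Ingestion On Apache Spark")
      .set("spark.sql.shuffle.partitions", "32")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val start = System.nanoTime()
    val combined = readCsv(sqlContext, pathA).unionAll(readCsv(sqlContext, pathB))
    val distinctRows = combined.dropDuplicates()
    // count() is an action; it forces the read, union, and deduplication to run.
    val n = distinctRows.count()
    val seconds = (System.nanoTime() - start) / 1e9

    println(s"master=$master rows=$n elapsed=${seconds}s")
    sc.stop()
  }
}
```

Timing a single action such as `count()` keeps the comparison between the two master settings limited to the read, union, and deduplication steps.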

Author: Sai Wai Maung. Reproduced under the CC BY-SA 4.0 license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/40496626/spark-master-local-is-a-lot-slower-than-master-local